AITopics | noise suppression

noise suppression

DroneAudioset: An Audio Dataset for Drone-based Search and Rescue

Gupta, Chitralekha, Ramesh, Soundarya, Sasikumar, Praveen, Yeo, Kian Peen, Nanayakkara, Suranga

arXiv.org Artificial Intelligence, Oct-20-2025

Unmanned Aerial Vehicles (UAVs), or drones, are increasingly used in search and rescue missions to detect human presence. Existing systems primarily leverage vision-based methods, which are prone to fail under low visibility or occlusion. Drone-based audio perception offers promise but suffers from extreme ego-noise that masks sounds indicating human presence. Existing datasets are either limited in diversity or synthetic, lacking real acoustic interactions, and there are no standardized setups for drone audition. To this end, we present DroneAudioset, a comprehensive drone audition dataset featuring 23.5 hours of annotated recordings, covering a wide range of signal-to-noise ratios (SNRs) from -57.2 dB to -2.5 dB, across various drone types, throttles, microphone configurations, and environments. The dataset enables development and systematic evaluation of noise suppression and classification methods for human-presence detection under challenging conditions, while also informing practical design considerations for drone audition systems, such as microphone placement trade-offs and the development of drone noise-aware audio processing. This dataset is an important step towards enabling the design and deployment of drone-audition systems. The dataset is publicly available at https://huggingface.co/datasets/ahlab-drone-project/DroneAudioSet/ under the MIT license.
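To put the quoted SNR range in perspective, the sketch below converts between decibels and linear power ratios using the standard definition SNR = 10 log10(P_signal / P_noise). The function names `snr_db` and `power_ratio` are illustrative and not taken from the dataset or its tooling.

```python
import math

def snr_db(signal, noise):
    """SNR in dB from two sample sequences: 10 * log10(P_signal / P_noise)."""
    p_signal = sum(s * s for s in signal) / len(signal)  # mean power of the signal
    p_noise = sum(n * n for n in noise) / len(noise)     # mean power of the noise
    return 10.0 * math.log10(p_signal / p_noise)

def power_ratio(snr_in_db):
    """Linear signal-to-noise power ratio corresponding to an SNR in dB."""
    return 10.0 ** (snr_in_db / 10.0)

# How many times stronger the noise power is than the signal power
# at the two ends of the range quoted in the abstract.
noise_over_signal_low = 1.0 / power_ratio(-57.2)
noise_over_signal_high = 1.0 / power_ratio(-2.5)
```

Under this definition, -57.2 dB corresponds to noise power on the order of 5 x 10^5 times the signal power, while -2.5 dB corresponds to noise power a little under twice the signal power.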

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.15383

Country:

Asia > Singapore (0.04)
Asia > Japan > Honshū > Tōhoku > Iwate Prefecture > Morioka (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Government (1.00)
Health & Medicine (0.87)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

A Unified Cortical Circuit Model with Divisive Normalization and Self-Excitation for Robust Representation and Memory Maintenance

Su, Jie, Wang, Weiwei, Gu, Zhaotian, Wang, Dahui, Qian, Tianyi

arXiv.org Artificial Intelligence, Aug-19-2025

Robust information representation and its persistent maintenance are fundamental for higher cognitive functions. Existing models employ distinct neural mechanisms to separately address noise-resistant processing or information maintenance, yet a unified framework integrating both operations remains elusive -- a critical gap in understanding cortical computation. Here, we introduce a recurrent neural circuit that combines divisive normalization with self-excitation to achieve both robust encoding and stable retention of normalized inputs. Mathematical analysis shows that, for suitable parameter regimes, the system forms a continuous attractor with two key properties: (1) input-proportional stabilization during stimulus presentation; and (2) self-sustained memory states persisting after stimulus offset. We demonstrate the model's versatility in two canonical tasks: (a) noise-robust encoding in a random-dot kinematogram (RDK) paradigm; and (b) approximate Bayesian belief updating in a probabilistic Wisconsin Card Sorting Test (pWCST). This work establishes a unified mathematical framework that bridges noise suppression, working memory, and approximate Bayesian inference within a single cortical microcircuit, offering fresh insights into the brain's canonical computation and guiding the design of biologically plausible artificial neural architectures.
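For readers unfamiliar with divisive normalization, here is a minimal sketch of its textbook static form, in which each unit's drive is divided by the pooled activity of the population. This is not the paper's circuit: the recurrent dynamics and the self-excitation term are omitted, and the function name and the parameters `sigma` and `exponent` are assumptions chosen for illustration.

```python
def divisive_normalization(drives, sigma=1.0, exponent=2.0):
    """Textbook divisive normalization: r_i = d_i^n / (sigma^n + sum_j d_j^n).

    drives   -- non-negative input drives, one per unit
    sigma    -- semi-saturation constant (assumed value, keeps the pool positive)
    exponent -- the power n applied to each drive (assumed value)
    """
    powered = [d ** exponent for d in drives]
    pool = sigma ** exponent + sum(powered)  # shared normalization pool
    return [p / pool for p in powered]

# With sigma = 0 the output depends only on the relative drives,
# so scaling every input by the same factor leaves it unchanged.
weak = divisive_normalization([3.0, 4.0], sigma=0.0)
strong = divisive_normalization([30.0, 40.0], sigma=0.0)
```

The invariance to a common input scale is one way to see why normalization helps with noise-robust encoding: the response pattern reflects the ratio between inputs, not their absolute magnitude.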

artificial intelligence, bayesian inference, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2508.12702

Country:

North America > United States > Wisconsin (0.24)
Asia > China > Beijing > Beijing (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

CheapNET: Improving Light-weight speech enhancement network by projected loss function

Tan, Kaijun, Dai, Benzhe, Li, Jiakui, Mao, Wenyu

arXiv.org Artificial Intelligence, Nov-27-2023

Noise suppression and echo cancellation are critical in speech enhancement and essential for smart devices and real-time communication. Deployed in voice processing front-ends and edge devices, these algorithms must ensure efficient real-time inference with low computational demands. Traditional edge-based noise suppression often uses MSE-based amplitude spectrum mask training, but this approach has limitations. We introduce a novel projection loss function, diverging from MSE, to enhance noise suppression. This method uses projection techniques to isolate key audio components from noise, significantly improving model performance. For echo cancellation, the function enables direct predictions on LAEC pre-processed outputs, substantially enhancing performance. Our noise suppression model achieves near state-of-the-art results with only 3.1M parameters and 0.4GFlops/s computational load. Moreover, our echo cancellation model outperforms replicated industry-leading models, introducing a new perspective in speech enhancement.

enhancement, loss function, speech enhancement, (15 more...)

arXiv.org Artificial Intelligence

2311.15959

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Speech (0.67)

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

Wang, Xiaofei, Thakker, Manthan, Chen, Zhuo, Kanda, Naoyuki, Eskimez, Sefik Emre, Chen, Sanyuan, Tang, Min, Liu, Shujie, Li, Jinyu, Yoshioka, Takuya

arXiv.org Artificial Intelligence, Aug-13-2023

Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech. However, existing models still face limitations in handling diverse audio-text speech generation tasks involving transforming input speech and processing audio captured in adverse acoustic conditions. This paper introduces SpeechX, a versatile speech generation model capable of zero-shot TTS and various speech transformation tasks, dealing with both clean and noisy signals. SpeechX combines neural codec language modeling with multi-task learning using task-dependent prompting, enabling unified and extensible modeling and providing a consistent way for leveraging textual input in speech enhancement and transformation tasks. Experimental results show SpeechX's efficacy in various tasks, including zero-shot TTS, noise suppression, target speaker extraction, speech removal, and speech editing with or without background noise, achieving comparable or superior performance to specialized models across tasks. See https://aka.ms/speechx for demo samples.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2308.06873

Country: North America > United States > Washington > King County > Redmond (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Egocentric Audio-Visual Noise Suppression

Sharma, Roshan, He, Weipeng, Lin, Ju, Lakomkin, Egor, Liu, Yang, Kalgaonkar, Kaustubh

arXiv.org Artificial Intelligence, May-2-2023

This paper studies audio-visual noise suppression for egocentric videos -- where the speaker is not captured in the video. Instead, potential noise sources are visible on screen with the camera emulating the off-screen speaker's view of the outside world. This setting is different from prior work in audio-visual speech enhancement that relies on lip and facial visuals. In this paper, we first demonstrate that egocentric visual information is helpful for noise suppression. We compare object recognition and action classification-based visual feature extractors and investigate methods to align audio and visual representations. Then, we examine different fusion strategies for the aligned features, and locations within the noise suppression model to incorporate visual information. Experiments demonstrate that visual features are most helpful when used to generate additive correction masks. Finally, in order to ensure that the visual features are discriminative with respect to different noise types, we introduce a multi-task learning framework that jointly optimizes audio-visual noise suppression and video-based acoustic event detection. This proposed multi-task framework outperforms the audio-only baseline on all metrics, including a 0.16 PESQ improvement. Extensive ablations reveal the improved performance of the proposed model with multiple active distractors, over all noise types, and across different SNRs.

artificial intelligence, image understanding, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2211.03643

Country: North America > United States (0.04)

Genre: Research Report (0.70)

Industry: Media (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.80)

Noise Suppression Based on Neurophysiologically-motivated SNR Estimation for Robust Speech Recognition

Neural Information Processing Systems, Apr-6-2023, 17:08:49 GMT

For SNR estimation, the input signal is transformed into so-called Amplitude Modulation Spectrograms (AMS), which represent both spectral and temporal characteristics of the respective analysis frame, and which imitate the representation of modulation frequencies in higher stages of the mammalian auditory system. A neural network is used to analyse AMS patterns generated from noisy speech and estimates the local SNR. Noise suppression is achieved by attenuating frequency channels according to their SNR. The noise suppression algorithm is evaluated in speaker-independent digit recognition experiments and compared to noise suppression by Spectral Subtraction.
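The abstract does not state which attenuation rule maps a channel's estimated SNR to a gain. As a hedged illustration of the general idea, the sketch below uses the classic Wiener-style gain G = SNR / (1 + SNR), with the SNR taken as a linear power ratio. The gain rule and the function names are assumptions, not the paper's method.

```python
def wiener_gain(snr_db):
    """Wiener-style gain in [0, 1) from a per-channel SNR estimate in dB (assumed rule)."""
    snr = 10.0 ** (snr_db / 10.0)  # dB -> linear power ratio
    return snr / (1.0 + snr)

def attenuate_channels(magnitudes, snr_db_per_channel):
    """Scale each frequency channel's magnitude by the gain for its estimated SNR."""
    return [m * wiener_gain(s) for m, s in zip(magnitudes, snr_db_per_channel)]

# Channels with high estimated SNR pass almost unchanged;
# channels dominated by noise are strongly attenuated.
cleaned = attenuate_channels([2.0, 2.0, 2.0], [30.0, 0.0, -30.0])
```

In this sketch a channel at 0 dB is halved, a channel at 30 dB is left nearly untouched, and a channel at -30 dB is reduced to roughly a thousandth of its magnitude.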

artificial intelligence, machine learning, neurophysiologically-motivated snr estimation, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.30)

Time-Variance Aware Real-Time Speech Enhancement

Zheng, Chengyu, Zhou, Yuan, Peng, Xiulian, Zhang, Yuan, Lu, Yan

arXiv.org Artificial Intelligence, Feb-25-2023

Time-variant factors often occur in real-world full-duplex communication applications. Some are caused by the complex environment, such as non-stationary environmental noise and a varying acoustic path, while others are caused by the communication system, such as the dynamic delay between the far-end and near-end signals. Current end-to-end deep neural network (DNN) based methods usually model the time-variant components implicitly and can hardly handle the unpredictable time-variance in real-time speech enhancement. To explicitly capture the time-variant components, we propose a dynamic kernel generation (DKG) module that can be introduced as a learnable plug-in to a DNN-based end-to-end pipeline. Specifically, the DKG module generates a convolutional kernel for each input audio frame, so that the DNN model is able to dynamically adjust its weights according to the input signal during inference. Experimental results verify that the DKG module improves the performance of the model under time-variant scenarios in the joint acoustic echo cancellation (AEC) and deep noise suppression (DNS) tasks.

artificial intelligence, machine learning, module, (18 more...)

arXiv.org Artificial Intelligence

2302.13063

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Microsoft Teams can now suppress background noise using machine learning

#artificialintelligence, Jan-26-2022, 11:47:34 GMT

Microsoft has a couple of new audio features in the works for Teams. Machine-learning-based noise suppression can automatically detect background noise that needs to be suppressed, allowing the communication app to make speech audio clearer within meetings. An automatic music detection feature is also on the way to Teams, though people will have to wait a few months to try it out. The high-fidelity music mode reduces bitrate by a factor of four compared to lossless encoding. Microsoft built the new feature with music lessons and performances in mind.

high-fidelity music mode, microsoft team, suppress background noise, (8 more...)

#artificialintelligence

Country: Europe (0.18)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.92)

End-to-End Complex-Valued Multidilated Convolutional Neural Network for Joint Acoustic Echo Cancellation and Noise Suppression

Watcharasupat, Karn N., Nguyen, Thi Ngoc Tho, Gan, Woon-Seng, Zhao, Shengkui, Ma, Bin

arXiv.org Artificial Intelligence, Oct-11-2021

Echo and noise suppression is an integral part of a full-duplex communication system. Many recent acoustic echo cancellation (AEC) systems rely on a separate adaptive filtering module for linear echo suppression and a neural module for residual echo suppression. However, not only do adaptive filtering modules require convergence and remain susceptible to changes in acoustic environments, but this two-stage framework also often introduces unnecessary delays to the AEC system when neural modules are already capable of both linear and nonlinear echo suppression. In this paper, we exploit the offset-compensating ability of complex time-frequency masks and propose an end-to-end complex-valued neural network architecture. The building block of the proposed model is a pseudocomplex extension based on the densely-connected multidilated DenseNet (D3Net) building block, resulting in a very small network of only 354K parameters. The architecture utilizes the multi-resolution nature of the D3Net building blocks to eliminate the need for pooling, allowing the network to extract features using large receptive fields without any loss of output resolution. We also propose a dual-mask technique for joint echo and noise suppression with simultaneous speech enhancement. Evaluation on both synthetic and real test sets demonstrated promising results across multiple energy-based metrics and perceptual proxies.
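One way to read the "offset-compensating ability" of complex masks: multiplying a time-frequency bin by a complex number changes both its magnitude and its phase, whereas a real-valued mask can only rescale it. The sketch below illustrates that property on a single bin. It is not the paper's network or its dual-mask scheme, and `apply_complex_mask` is a hypothetical helper.

```python
import cmath
import math

def apply_complex_mask(bin_value, gain, phase_shift):
    """Multiply one complex time-frequency bin by a mask given in polar form.

    gain        -- magnitude scaling applied by the mask
    phase_shift -- phase rotation in radians applied by the mask
    """
    mask = cmath.rect(gain, phase_shift)  # gain * exp(j * phase_shift)
    return bin_value * mask

# A real mask (phase_shift = 0) only scales the bin.
scaled = apply_complex_mask(1 + 0j, 0.5, 0.0)
# A complex mask can also rotate the phase, here by 90 degrees.
rotated = apply_complex_mask(1 + 0j, 0.5, math.pi / 2)
```

Since a small time offset in the signal appears as a phase rotation in each frequency bin, a mask that can rotate phase can partly correct such offsets, which a magnitude-only mask cannot do.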

enhancement, loopback signal, proc, (14 more...)

arXiv.org Artificial Intelligence

2110.00745

Country: Asia > Singapore (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.64)

Digital stethoscope uses artificial intelligence for diagnosing lung abnormalities

#artificialintelligence, Dec-8-2020, 01:00:37 GMT

Stethoscopes are a ubiquitous and cost-effective tool for medical diagnosis, but they open the door to subjectivity and can experience high levels of environmental noise. This makes it difficult to properly diagnose lung abnormalities, like COVID-19, by listening to sounds from the body. James West, at Johns Hopkins University, has been developing a digital stethoscope equipped with artificial intelligence for accurate lung diagnoses. He will discuss its opportunities and obstacles at the 179th ASA Meeting.

artificial intelligence, digital stethoscope use artificial intelligence, lung abnormality, (8 more...)

#artificialintelligence

Country: North America > United States > New York > Suffolk County > Melville (0.06)

Genre: Press Release (0.39)

Industry:

Health & Medicine > Health Care Equipment & Supplies (0.88)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.38)
Health & Medicine > Therapeutic Area > Immunology (0.38)

Technology: Information Technology > Artificial Intelligence (0.62)
